Overview

Dataset statistics

Number of variables: 16
Number of observations: 183
Missing cells: 15
Missing cells (%): 0.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 100.4 KiB
Average record size in memory: 562.0 B
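
As a quick sanity check, the percentages above can be re-derived from the raw counts. The sketch below assumes that the missing-cell share is the number of missing cells divided by variables times observations, and that total size is roughly the average record size times the number of records. The variable names are illustrative and not part of the report.

```python
# Counts copied from the Dataset statistics block above.
n_variables = 16
n_observations = 183
missing_cells = 15
avg_record_bytes = 562.0  # reported average record size in bytes

# Assumption: missing share = missing cells / total cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = average record size * number of records.
total_kib = avg_record_bytes * n_observations / 1024

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Approximate total size: {total_kib:.1f} KiB")
```

Under these assumptions both derived figures round to the values the report shows.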

Variable types

Text: 1
DateTime: 4
Categorical: 4
Numeric: 7

Dataset

Description: AIDS Clinical Trials Group 019 - HIV treatment trial
Creator: RP2 Clinical Data Harmonization Project
URL: HEAT Research Projects

Variable descriptions

study_source: Source study identifier
CD4 cell count (cells/µL): CD4+ T lymphocyte count - immune function indicator
HIV viral load (copies/mL): HIV RNA copies per mL - treatment efficacy marker
Albumin (g/dL): Serum albumin - liver function and nutritional status
primary_date: Primary date of measurement/visit
Age (at enrolment): Patient age at study enrollment

Alerts

ALT (U/L) is highly overall correlated with AST (U/L) and 4 other fields | High correlation
AST (U/L) is highly overall correlated with ALT (U/L) | High correlation
Patient ID is highly overall correlated with ALT (U/L) and 4 other fields | High correlation
coordinate_source is highly overall correlated with ALT (U/L) and 5 other fields | High correlation
month is highly overall correlated with coordinate_source and 2 other fields | High correlation
original_record_index is highly overall correlated with ALT (U/L) and 4 other fields | High correlation
season is highly overall correlated with Patient ID and 4 other fields | High correlation
year is highly overall correlated with ALT (U/L) and 5 other fields | High correlation
ALT (U/L) is highly imbalanced (59.1%) | Imbalance
AST (U/L) has 15 (8.2%) missing values | Missing
original_record_index is uniformly distributed | Uniform
original_record_index has unique values | Unique
harmonization_date has unique values | Unique
Hematocrit (%) has 26 (14.2%) zeros | Zeros

Reproduction

Analysis started: 2025-11-11 10:31:57.210747
Analysis finished: 2025-11-11 10:33:35.251591
Duration: 1 minute and 38.04 seconds
Software version: ydata-profiling v4.17.0
Download configuration: config.json
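
The reported duration follows directly from the two timestamps. A minimal sketch, using only the Python standard library and assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2025-11-11 10:31:57.210747")
finished = datetime.fromisoformat("2025-11-11 10:33:35.251591")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```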

Variables

anonymous_patient_id
Text

Distinct: 100
Distinct (%): 54.6%
Missing: 0
Missing (%): 0.0%
Memory size: 13.2 KiB

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Characters and Unicode

Total characters: 3111
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 19
Unique (%): 10.4%

Sample

1st row: HEAT_A5FA56CF6DEE
2nd row: HEAT_369244B23FC3
3rd row: HEAT_25491365061F
4th row: HEAT_384C0702B398
5th row: HEAT_384C0702B398

Value | Count | Frequency (%)
heat_384c0702b398 | 4 | 2.2%
heat_108fa716a13b | 2 | 1.1%
heat_83514d501429 | 2 | 1.1%
heat_f4459fca5b34 | 2 | 1.1%
heat_e49437317781 | 2 | 1.1%
heat_702c7faec79a | 2 | 1.1%
heat_e3b7af63e12b | 2 | 1.1%
heat_0849de663281 | 2 | 1.1%
heat_b87315faeee7 | 2 | 1.1%
heat_7436c2f045e0 | 2 | 1.1%
Other values (90) | 161 | 88.0%

Most occurring characters

Value | Count | Frequency (%)
E | 326 | 10.5%
A | 307 | 9.9%
H | 183 | 5.9%
T | 183 | 5.9%
_ | 183 | 5.9%
0 | 164 | 5.3%
1 | 162 | 5.2%
D | 160 | 5.1%
F | 155 | 5.0%
7 | 148 | 4.8%
Other values (9) | 1140 | 36.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3111 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3111 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3111 | 100.0%
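
The sample identifiers and the character statistics for this text variable are consistent with a fixed-shape ID. The sketch below assumes a pattern of the literal prefix `HEAT_` followed by 12 uppercase hexadecimal digits; that pattern is inferred from the sample rows and is not stated in the report.

```python
import re
import string

# Sample identifiers copied from the report.
sample_ids = [
    "HEAT_A5FA56CF6DEE",
    "HEAT_369244B23FC3",
    "HEAT_25491365061F",
    "HEAT_384C0702B398",
]

# Assumed pattern: literal "HEAT_" plus 12 uppercase hexadecimal digits.
ID_PATTERN = re.compile(r"HEAT_[0-9A-F]{12}")

all_match = all(ID_PATTERN.fullmatch(s) is not None for s in sample_ids)

# Characters the assumed pattern can produce: prefix letters plus hex digits.
alphabet = set("HEAT_") | set(string.digits) | set("ABCDEF")

# Every value has length 17, so 183 rows give 183 * 17 characters.
total_characters = 183 * 17

print(all_match, len(alphabet), total_characters)
```

The assumed alphabet has 19 symbols and the character total is 3111, matching the "Distinct characters" and "Total characters" figures.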

primary_date
Date

Primary date of measurement/visit

Distinct: 40
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
Minimum: 2005-05-31 00:00:00
Maximum: 2007-06-12 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=40)

year
Categorical

High correlation 

Distinct: 3
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 10.9 KiB
2006: 130
2007: 38
2005: 15

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 732
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2005
2nd row: 2005
3rd row: 2005
4th row: 2005
5th row: 2005

Common Values

Value | Count | Frequency (%)
2006 | 130 | 71.0%
2007 | 38 | 20.8%
2005 | 15 | 8.2%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 366 | 50.0%
2 | 183 | 25.0%
6 | 130 | 17.8%
7 | 38 | 5.2%
5 | 15 | 2.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 732 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 732 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 732 | 100.0%

month
Real number (ℝ)

High correlation 

Distinct: 11
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.5355191
Minimum: 1
Maximum: 11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 7
Q3: 9
95-th percentile: 11
Maximum: 11
Range: 10
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.7169787
Coefficient of variation (CV): 0.415725
Kurtosis: -0.89092223
Mean: 6.5355191
Median Absolute Deviation (MAD): 2
Skewness: -0.096538685
Sum: 1196
Variance: 7.3819732
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values

Value | Count | Frequency (%)
7 | 29 | 15.8%
5 | 25 | 13.7%
10 | 25 | 13.7%
4 | 24 | 13.1%
8 | 18 | 9.8%
6 | 15 | 8.2%
11 | 12 | 6.6%
9 | 12 | 6.6%
2 | 9 | 4.9%
3 | 8 | 4.4%

Minimum values

Value | Count | Frequency (%)
1 | 6 | 3.3%
2 | 9 | 4.9%
3 | 8 | 4.4%
4 | 24 | 13.1%
5 | 25 | 13.7%
6 | 15 | 8.2%
7 | 29 | 15.8%
8 | 18 | 9.8%
9 | 12 | 6.6%
10 | 25 | 13.7%

Maximum values

Value | Count | Frequency (%)
11 | 12 | 6.6%
10 | 25 | 13.7%
9 | 12 | 6.6%
8 | 18 | 9.8%
7 | 29 | 15.8%
6 | 15 | 8.2%
5 | 25 | 13.7%
4 | 24 | 13.1%
3 | 8 | 4.4%
2 | 9 | 4.9%
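
Because month takes only 11 distinct values, its descriptive statistics can be re-derived from the frequency tables alone. The sketch below assumes the report uses the sample variance (divisor n - 1); the dictionary name is illustrative.

```python
import math

# Month -> count, copied from the month frequency tables above.
month_counts = {1: 6, 2: 9, 3: 8, 4: 24, 5: 25, 6: 15,
                7: 29, 8: 18, 9: 12, 10: 25, 11: 12}

n = sum(month_counts.values())
total = sum(value * count for value, count in month_counts.items())
mean = total / n

# Assumption: the report uses the sample variance (divisor n - 1).
squared_dev = sum(count * (value - mean) ** 2
                  for value, count in month_counts.items())
variance = squared_dev / (n - 1)
std_dev = math.sqrt(variance)
cv = std_dev / mean  # coefficient of variation

print(f"n={n} sum={total} mean={mean:.7f}")
print(f"variance={variance:.7f} std={std_dev:.7f} cv={cv:.6f}")
```

With that divisor, the recomputed sum, mean, variance, standard deviation, and coefficient of variation agree with the reported values to the printed precision.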

season
Categorical

High correlation 

Distinct: 4
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 11.3 KiB
Winter: 62
Autumn: 57
Spring: 49
Summer: 15

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 1098
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Winter
2nd row: Winter
3rd row: Autumn
4th row: Autumn
5th row: Autumn

Common Values

Value | Count | Frequency (%)
Winter | 62 | 33.9%
Autumn | 57 | 31.1%
Spring | 49 | 26.8%
Summer | 15 | 8.2%
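
The season counts are consistent with a Southern Hemisphere month-to-season mapping applied to the month variable. The function below is a hypothetical reconstruction rather than the harmonization pipeline's own code: it assumes three-month meteorological seasons starting in December, which is what the sample rows (May as Autumn, June as Winter) suggest.

```python
from collections import Counter

# Month -> count, copied from the month frequency tables in this report.
month_counts = {1: 6, 2: 9, 3: 8, 4: 24, 5: 25, 6: 15,
                7: 29, 8: 18, 9: 12, 10: 25, 11: 12}


def southern_season(month: int) -> str:
    """Map a calendar month to a Southern Hemisphere season.

    Assumption: seasons are three-month blocks starting in December.
    """
    if month in (12, 1, 2):
        return "Summer"
    if month in (3, 4, 5):
        return "Autumn"
    if month in (6, 7, 8):
        return "Winter"
    if month in (9, 10, 11):
        return "Spring"
    raise ValueError(f"month must be in 1..12, got {month}")


season_counts = Counter()
for month, count in month_counts.items():
    season_counts[southern_season(month)] += count

print(dict(season_counts))
```

Under this mapping the derived counts match the season table exactly, which also explains the 0.983 correlation between month and season.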

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
n | 168 | 15.3%
u | 129 | 11.7%
r | 126 | 11.5%
t | 119 | 10.8%
i | 111 | 10.1%
m | 87 | 7.9%
e | 77 | 7.0%
S | 64 | 5.8%
W | 62 | 5.6%
A | 57 | 5.2%
Other values (2) | 98 | 8.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1098 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1098 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1098 | 100.0%

Hemoglobin (g/dL)
Real number (ℝ)

Distinct: 7
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.95082
Minimum: 9
Maximum: 15
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 9
5-th percentile: 10
Q1: 11
median: 12
Q3: 13
95-th percentile: 14
Maximum: 15
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.1826525
Coefficient of variation (CV): 0.098959948
Kurtosis: 0.13407273
Mean: 11.95082
Median Absolute Deviation (MAD): 1
Skewness: 0.095980498
Sum: 2187
Variance: 1.3986669
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
12 | 58 | 31.7%
13 | 48 | 26.2%
11 | 47 | 25.7%
10 | 15 | 8.2%
14 | 7 | 3.8%
15 | 5 | 2.7%
9 | 3 | 1.6%

Minimum values

Value | Count | Frequency (%)
9 | 3 | 1.6%
10 | 15 | 8.2%
11 | 47 | 25.7%
12 | 58 | 31.7%
13 | 48 | 26.2%
14 | 7 | 3.8%
15 | 5 | 2.7%

Maximum values

Value | Count | Frequency (%)
15 | 5 | 2.7%
14 | 7 | 3.8%
13 | 48 | 26.2%
12 | 58 | 31.7%
11 | 47 | 25.7%
10 | 15 | 8.2%
9 | 3 | 1.6%

Hematocrit (%)
Real number (ℝ)

Zeros 

Distinct: 20
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.448087
Minimum: 0
Maximum: 58
Zeros: 26
Zeros (%): 14.2%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 10
median: 30
Q3: 45
95-th percentile: 55
Maximum: 58
Range: 58
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 19.038027
Coefficient of variation (CV): 0.69360122
Kurtosis: -1.3592414
Mean: 27.448087
Median Absolute Deviation (MAD): 15
Skewness: -0.038707168
Sum: 5023
Variance: 362.44647
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)

Common values

Value | Count | Frequency (%)
0 | 26 | 14.2%
45 | 19 | 10.4%
15 | 15 | 8.2%
25 | 13 | 7.1%
10 | 13 | 7.1%
30 | 13 | 7.1%
35 | 13 | 7.1%
40 | 13 | 7.1%
50 | 13 | 7.1%
5 | 12 | 6.6%
Other values (10) | 33 | 18.0%

Minimum values

Value | Count | Frequency (%)
0 | 26 | 14.2%
3 | 2 | 1.1%
5 | 12 | 6.6%
10 | 13 | 7.1%
15 | 15 | 8.2%
16 | 2 | 1.1%
20 | 4 | 2.2%
24 | 2 | 1.1%
25 | 13 | 7.1%
30 | 13 | 7.1%

Maximum values

Value | Count | Frequency (%)
58 | 6 | 3.3%
57 | 3 | 1.6%
55 | 8 | 4.4%
53 | 2 | 1.1%
50 | 13 | 7.1%
48 | 2 | 1.1%
45 | 19 | 10.4%
43 | 2 | 1.1%
40 | 13 | 7.1%
35 | 13 | 7.1%
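
The Zeros alert on Hematocrit (%) matters because the zeros pull the mean down. The sketch below quantifies that effect under a hypothetical treatment in which zeros are regarded as missing; the report itself does not say what the zeros represent, so this is an illustration only.

```python
# Figures copied from the Hematocrit (%) statistics above.
n_rows = 183
zero_count = 26
value_sum = 5023  # reported Sum; zeros contribute nothing to it

zero_share = 100 * zero_count / n_rows
mean_all = value_sum / n_rows

# Hypothetical treatment: regard zeros as missing and average the rest.
non_zero_rows = n_rows - zero_count
mean_without_zeros = value_sum / non_zero_rows

print(f"zeros: {zero_share:.1f}%")
print(f"mean with zeros: {mean_all:.6f}")
print(f"mean without zeros: {mean_without_zeros:.2f}")
```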

White blood cell count (×10³/µL)
Real number (ℝ)

Distinct: 47
Distinct (%): 25.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6535519
Minimum: 2.2
Maximum: 10.7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 2.2
5-th percentile: 2.71
Q1: 3.6
median: 4.3
Q3: 5.55
95-th percentile: 7.5
Maximum: 10.7
Range: 8.5
Interquartile range (IQR): 1.95

Descriptive statistics

Standard deviation: 1.5826935
Coefficient of variation (CV): 0.3401044
Kurtosis: 2.8535255
Mean: 4.6535519
Median Absolute Deviation (MAD): 0.9
Skewness: 1.348183
Sum: 851.6
Variance: 2.5049186
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=47)

Common values

Value | Count | Frequency (%)
4.2 | 10 | 5.5%
3.9 | 9 | 4.9%
3.6 | 8 | 4.4%
5.2 | 7 | 3.8%
4.3 | 7 | 3.8%
3.2 | 7 | 3.8%
4.9 | 7 | 3.8%
3.7 | 7 | 3.8%
3.1 | 7 | 3.8%
4.6 | 6 | 3.3%
Other values (37) | 108 | 59.0%

Minimum values

Value | Count | Frequency (%)
2.2 | 2 | 1.1%
2.5 | 4 | 2.2%
2.6 | 2 | 1.1%
2.7 | 2 | 1.1%
2.8 | 4 | 2.2%
2.9 | 3 | 1.6%
3 | 4 | 2.2%
3.1 | 7 | 3.8%
3.2 | 7 | 3.8%
3.3 | 4 | 2.2%

Maximum values

Value | Count | Frequency (%)
10.7 | 2 | 1.1%
10.6 | 2 | 1.1%
8 | 2 | 1.1%
7.6 | 2 | 1.1%
7.5 | 4 | 2.2%
7.1 | 2 | 1.1%
6.7 | 3 | 1.6%
6.5 | 2 | 1.1%
6.4 | 2 | 1.1%
6.3 | 2 | 1.1%

ALT (U/L)
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB
1.0: 168
2.0: 15

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 549
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value | Count | Frequency (%)
1.0 | 168 | 91.8%
2.0 | 15 | 8.2%
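
The 59.1% figure in the Imbalance alert can be reproduced from these two counts. The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for the number of classes. That formula is an assumption chosen because it matches the reported number, not something the report states.

```python
import math

# Category -> count, copied from the ALT (U/L) Common Values table above.
counts = {"1.0": 168, "2.0": 15}

total = sum(counts.values())
probabilities = [c / total for c in counts.values()]

# Shannon entropy in bits.
entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)

# Assumed score: 1 - entropy / maximum entropy for this many classes.
max_entropy = math.log2(len(counts))
imbalance = 1 - entropy / max_entropy

print(f"imbalance = {100 * imbalance:.1f}%")
```

A perfectly balanced variable would score 0% under this definition, and a variable with a single dominant class approaches 100%.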

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
. | 183 | 33.3%
0 | 183 | 33.3%
1 | 168 | 30.6%
2 | 15 | 2.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 549 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 549 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 549 | 100.0%

AST (U/L)
Real number (ℝ)

High correlation  Missing 

Distinct: 78
Distinct (%): 46.4%
Missing: 15
Missing (%): 8.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.233929
Minimum: 36.8
Maximum: 76.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 36.8
5-th percentile: 41.2
Q1: 51.95
median: 58.8
Q3: 65.775
95-th percentile: 72.77
Maximum: 76.2
Range: 39.4
Interquartile range (IQR): 13.825

Descriptive statistics

Standard deviation: 9.5483129
Coefficient of variation (CV): 0.16396477
Kurtosis: -0.70805753
Mean: 58.233929
Median Absolute Deviation (MAD): 7.05
Skewness: -0.22172206
Sum: 9783.3
Variance: 91.170279
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
39.2 | 4 | 2.2%
59.4 | 4 | 2.2%
45.6 | 4 | 2.2%
54 | 4 | 2.2%
50 | 4 | 2.2%
54.7 | 4 | 2.2%
71.6 | 4 | 2.2%
65.7 | 4 | 2.2%
48.4 | 3 | 1.6%
70.1 | 3 | 1.6%
Other values (68) | 130 | 71.0%
(Missing) | 15 | 8.2%

Minimum values

Value | Count | Frequency (%)
36.8 | 2 | 1.1%
39.2 | 4 | 2.2%
40.5 | 2 | 1.1%
41.2 | 2 | 1.1%
42.9 | 1 | 0.5%
43.5 | 2 | 1.1%
43.8 | 2 | 1.1%
45.5 | 2 | 1.1%
45.6 | 4 | 2.2%
45.8 | 2 | 1.1%

Maximum values

Value | Count | Frequency (%)
76.2 | 1 | 0.5%
75.4 | 2 | 1.1%
74.6 | 1 | 0.5%
74.4 | 2 | 1.1%
74 | 2 | 1.1%
73.4 | 1 | 0.5%
71.6 | 4 | 2.2%
71 | 2 | 1.1%
70.4 | 2 | 1.1%
70.1 | 3 | 1.6%
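
AST (U/L) is the only variable with missing values, so it is worth checking which denominators its percentages use. The sketch below assumes that Missing (%) is taken over all rows, while Distinct (%) and Mean are taken over non-missing rows only.

```python
# Figures copied from the AST (U/L) statistics above.
n_rows = 183
n_missing = 15
n_distinct = 78
value_sum = 9783.3

n_present = n_rows - n_missing

# Missing share is taken over all rows.
missing_pct = 100 * n_missing / n_rows

# Assumption: Distinct (%) and Mean are computed over non-missing rows only.
distinct_pct = 100 * n_distinct / n_present
mean = value_sum / n_present

print(f"present rows: {n_present}")
print(f"missing: {missing_pct:.1f}%  distinct: {distinct_pct:.1f}%")
print(f"mean: {mean:.6f}")
```

Dividing the distinct count by all 183 rows would give about 42.6% instead, so the non-missing denominator is the one consistent with the reported 46.4%.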

Patient ID
Real number (ℝ)

High correlation 

Distinct: 100
Distinct (%): 54.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1110185.9
Minimum: 1110031
Maximum: 1110289
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 1110031
5-th percentile: 1110040.1
Q1: 1110132.5
median: 1110194
Q3: 1110251.5
95-th percentile: 1110283
Maximum: 1110289
Range: 258
Interquartile range (IQR): 119

Descriptive statistics

Standard deviation: 75.721443
Coefficient of variation (CV): 6.8206094 × 10⁻⁵
Kurtosis: -0.86496361
Mean: 1110185.9
Median Absolute Deviation (MAD): 61
Skewness: -0.48213463
Sum: 2.0316402 × 10⁸
Variance: 5733.7369
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1110037 | 4 | 2.2%
1110088 | 2 | 1.1%
1110047 | 2 | 1.1%
1110040 | 2 | 1.1%
1110149 | 2 | 1.1%
1110157 | 2 | 1.1%
1110158 | 2 | 1.1%
1110080 | 2 | 1.1%
1110079 | 2 | 1.1%
1110139 | 2 | 1.1%
Other values (90) | 161 | 88.0%

Minimum values

Value | Count | Frequency (%)
1110031 | 1 | 0.5%
1110035 | 1 | 0.5%
1110036 | 1 | 0.5%
1110037 | 4 | 2.2%
1110039 | 1 | 0.5%
1110040 | 2 | 1.1%
1110041 | 1 | 0.5%
1110044 | 1 | 0.5%
1110047 | 2 | 1.1%
1110048 | 1 | 0.5%

Maximum values

Value | Count | Frequency (%)
1110289 | 1 | 0.5%
1110288 | 2 | 1.1%
1110286 | 2 | 1.1%
1110285 | 2 | 1.1%
1110284 | 2 | 1.1%
1110283 | 2 | 1.1%
1110281 | 2 | 1.1%
1110279 | 2 | 1.1%
1110278 | 2 | 1.1%
1110277 | 1 | 0.5%

original_record_index
Real number (ℝ)

High correlation  Uniform  Unique 

Distinct: 183
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 91
Minimum: 0
Maximum: 182
Zeros: 1
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 9.1
Q1: 45.5
median: 91
Q3: 136.5
95-th percentile: 172.9
Maximum: 182
Range: 182
Interquartile range (IQR): 91

Descriptive statistics

Standard deviation: 52.971691
Coefficient of variation (CV): 0.58210649
Kurtosis: -1.2
Mean: 91
Median Absolute Deviation (MAD): 46
Skewness: 0
Sum: 16653
Variance: 2806
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 1 | 0.5%
1 | 1 | 0.5%
2 | 1 | 0.5%
3 | 1 | 0.5%
4 | 1 | 0.5%
5 | 1 | 0.5%
6 | 1 | 0.5%
7 | 1 | 0.5%
8 | 1 | 0.5%
9 | 1 | 0.5%
Other values (173) | 173 | 94.5%

Minimum values

Value | Count | Frequency (%)
0 | 1 | 0.5%
1 | 1 | 0.5%
2 | 1 | 0.5%
3 | 1 | 0.5%
4 | 1 | 0.5%
5 | 1 | 0.5%
6 | 1 | 0.5%
7 | 1 | 0.5%
8 | 1 | 0.5%
9 | 1 | 0.5%

Maximum values

Value | Count | Frequency (%)
182 | 1 | 0.5%
181 | 1 | 0.5%
180 | 1 | 0.5%
179 | 1 | 0.5%
178 | 1 | 0.5%
177 | 1 | 0.5%
176 | 1 | 0.5%
175 | 1 | 0.5%
174 | 1 | 0.5%
173 | 1 | 0.5%
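
The statistics for original_record_index are what a plain row counter produces. The sketch below assumes the column is exactly 0, 1, ..., 182 (183 unique values, minimum 0, maximum 182, strictly increasing) and compares the computed sample variance with the closed form n(n + 1)/12 for the integers 0 to n - 1.

```python
import statistics

# Assumption: the index is exactly 0, 1, ..., 182.
index = list(range(183))
n = len(index)

mean = statistics.mean(index)
variance = statistics.variance(index)  # sample variance, divisor n - 1
closed_form = n * (n + 1) / 12         # sample variance of 0..n-1
std_dev = statistics.stdev(index)

print(f"sum={sum(index)} mean={mean} variance={variance}")
print(f"closed form={closed_form} std={std_dev:.6f}")
```

The results match the reported Sum, Mean, Variance, and Standard deviation, which supports treating this column as a technical row index rather than a measurement.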

harmonization_date
Date

Unique 

Distinct: 183
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
Minimum: 2025-08-11 15:41:05.869140
Maximum: 2025-08-11 15:41:05.904148
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)

visit_date
Date

Distinct: 40
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
Minimum: 2005-05-31 00:00:00
Maximum: 2007-06-12 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=40)

coordinate_source
Categorical

High correlation 

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.3 KiB
JHB_ACTG_021: 151
JHB_ACTG_019: 32

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 2196
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: JHB_ACTG_019
2nd row: JHB_ACTG_019
3rd row: JHB_ACTG_019
4th row: JHB_ACTG_019
5th row: JHB_ACTG_019

Common Values

Value | Count | Frequency (%)
JHB_ACTG_021 | 151 | 82.5%
JHB_ACTG_019 | 32 | 17.5%

Length

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
_ | 366 | 16.7%
J | 183 | 8.3%
H | 183 | 8.3%
B | 183 | 8.3%
A | 183 | 8.3%
C | 183 | 8.3%
T | 183 | 8.3%
G | 183 | 8.3%
0 | 183 | 8.3%
1 | 183 | 8.3%
Other values (2) | 183 | 8.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2196 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2196 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2196 | 100.0%

primary_date_parsed
Date

Distinct: 40
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
Minimum: 2005-05-31 00:00:00
Maximum: 2007-06-12 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=40)

Correlations

Variable | ALT (U/L) | AST (U/L) | Hematocrit (%) | Hemoglobin (g/dL) | Patient ID | White blood cell count (×10³/µL) | coordinate_source | month | original_record_index | season | year
ALT (U/L) | 1.000 | 1.000 | 0.207 | 0.323 | 0.677 | 0.000 | 0.620 | 0.381 | 0.852 | 0.238 | 0.997
AST (U/L) | 1.000 | 1.000 | -0.151 | 0.279 | -0.050 | 0.272 | 0.215 | 0.080 | -0.050 | 0.284 | 0.356
Hematocrit (%) | 0.207 | -0.151 | 1.000 | -0.257 | 0.160 | 0.102 | 0.184 | 0.062 | 0.160 | 0.158 | 0.204
Hemoglobin (g/dL) | 0.323 | 0.279 | -0.257 | 1.000 | -0.125 | 0.067 | 0.155 | 0.091 | -0.125 | 0.137 | 0.229
Patient ID | 0.677 | -0.050 | 0.160 | -0.125 | 1.000 | -0.059 | 0.677 | 0.206 | 1.000 | 0.668 | 0.679
White blood cell count (×10³/µL) | 0.000 | 0.272 | 0.102 | 0.067 | -0.059 | 1.000 | 0.081 | 0.031 | -0.059 | 0.210 | 0.114
coordinate_source | 0.620 | 0.215 | 0.184 | 0.155 | 0.677 | 0.081 | 1.000 | 0.727 | 0.852 | 0.675 | 0.657
month | 0.381 | 0.080 | 0.062 | 0.091 | 0.206 | 0.031 | 0.727 | 1.000 | 0.206 | 0.983 | 0.590
original_record_index | 0.852 | -0.050 | 0.160 | -0.125 | 1.000 | -0.059 | 0.852 | 0.206 | 1.000 | 0.745 | 0.909
season | 0.238 | 0.284 | 0.158 | 0.137 | 0.668 | 0.210 | 0.675 | 0.983 | 0.745 | 1.000 | 0.544
year | 0.997 | 0.356 | 0.204 | 0.229 | 0.679 | 0.114 | 0.657 | 0.590 | 0.909 | 0.544 | 1.000
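
The High correlation alerts can be traced back to this matrix. The sketch below flags, for each variable, every other variable whose absolute coefficient is at least 0.5. The 0.5 cutoff is an assumption, not stated in the report, chosen because it reproduces the number of fields named in each alert.

```python
# Variable order and values copied from the correlation matrix above.
names = [
    "ALT (U/L)", "AST (U/L)", "Hematocrit (%)", "Hemoglobin (g/dL)",
    "Patient ID", "White blood cell count (×10³/µL)", "coordinate_source",
    "month", "original_record_index", "season", "year",
]
matrix = [
    [1.000, 1.000, 0.207, 0.323, 0.677, 0.000, 0.620, 0.381, 0.852, 0.238, 0.997],
    [1.000, 1.000, -0.151, 0.279, -0.050, 0.272, 0.215, 0.080, -0.050, 0.284, 0.356],
    [0.207, -0.151, 1.000, -0.257, 0.160, 0.102, 0.184, 0.062, 0.160, 0.158, 0.204],
    [0.323, 0.279, -0.257, 1.000, -0.125, 0.067, 0.155, 0.091, -0.125, 0.137, 0.229],
    [0.677, -0.050, 0.160, -0.125, 1.000, -0.059, 0.677, 0.206, 1.000, 0.668, 0.679],
    [0.000, 0.272, 0.102, 0.067, -0.059, 1.000, 0.081, 0.031, -0.059, 0.210, 0.114],
    [0.620, 0.215, 0.184, 0.155, 0.677, 0.081, 1.000, 0.727, 0.852, 0.675, 0.657],
    [0.381, 0.080, 0.062, 0.091, 0.206, 0.031, 0.727, 1.000, 0.206, 0.983, 0.590],
    [0.852, -0.050, 0.160, -0.125, 1.000, -0.059, 0.852, 0.206, 1.000, 0.745, 0.909],
    [0.238, 0.284, 0.158, 0.137, 0.668, 0.210, 0.675, 0.983, 0.745, 1.000, 0.544],
    [0.997, 0.356, 0.204, 0.229, 0.679, 0.114, 0.657, 0.590, 0.909, 0.544, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff; not stated in the report

# For each variable, collect the other variables at or above the cutoff.
high = {}
for i, name in enumerate(names):
    partners = [names[j] for j, value in enumerate(matrix[i])
                if j != i and abs(value) >= THRESHOLD]
    if partners:
        high[name] = partners

for name, partners in high.items():
    print(f"{name}: {len(partners)} field(s) -> {', '.join(partners)}")
```

Several of the flagged pairs involve identifiers or derived date fields (Patient ID, original_record_index, year, month, season), so they reflect how the records are ordered and encoded rather than clinical relationships.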

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index) | anonymous_patient_id | primary_date | year | month | season | Hemoglobin (g/dL) | Hematocrit (%) | White blood cell count (×10³/µL) | ALT (U/L) | AST (U/L) | Patient ID | original_record_index | harmonization_date | visit_date | coordinate_source | primary_date_parsed
1520 | HEAT_A5FA56CF6DEE | 2005-06-01 | 2005 | 6 | Winter | 13.0 | 25.0 | 2.9 | 2.0 | NaN | 1110031 | 0.0 | 2025-08-11T15:41:05.869140 | 2005-06-01 | JHB_ACTG_019 | 2005-06-01
1521 | HEAT_369244B23FC3 | 2005-06-01 | 2005 | 6 | Winter | 12.0 | 57.0 | 6.2 | 2.0 | NaN | 1110035 | 1.0 | 2025-08-11T15:41:05.869376 | 2005-06-01 | JHB_ACTG_019 | 2005-06-01
1522 | HEAT_25491365061F | 2005-05-31 | 2005 | 5 | Autumn | 13.0 | 15.0 | 6.2 | 2.0 | NaN | 1110036 | 2.0 | 2025-08-11T15:41:05.869603 | 2005-05-31 | JHB_ACTG_019 | 2005-05-31
1523 | HEAT_384C0702B398 | 2005-05-31 | 2005 | 5 | Autumn | 12.0 | 30.0 | 4.6 | 2.0 | NaN | 1110037 | 3.0 | 2025-08-11T15:41:05.869814 | 2005-05-31 | JHB_ACTG_019 | 2005-05-31
1524 | HEAT_384C0702B398 | 2005-05-31 | 2005 | 5 | Autumn | 12.0 | 30.0 | 4.6 | 2.0 | NaN | 1110037 | 4.0 | 2025-08-11T15:41:05.870020 | 2005-05-31 | JHB_ACTG_019 | 2005-05-31
1525 | HEAT_384C0702B398 | 2005-06-01 | 2005 | 6 | Winter | 12.0 | 30.0 | 4.6 | 2.0 | NaN | 1110037 | 5.0 | 2025-08-11T15:41:05.870225 | 2005-06-01 | JHB_ACTG_019 | 2005-06-01
1526 | HEAT_384C0702B398 | 2005-06-01 | 2005 | 6 | Winter | 12.0 | 30.0 | 4.6 | 2.0 | NaN | 1110037 | 6.0 | 2025-08-11T15:41:05.870429 | 2005-06-01 | JHB_ACTG_019 | 2005-06-01
1527 | HEAT_577544911E15 | 2005-06-02 | 2005 | 6 | Winter | 15.0 | 0.0 | 5.3 | 2.0 | NaN | 1110039 | 7.0 | 2025-08-11T15:41:05.870648 | 2005-06-02 | JHB_ACTG_019 | 2005-06-02
1528 | HEAT_F4459FCA5B34 | 2005-05-31 | 2005 | 5 | Autumn | 15.0 | 25.0 | 3.9 | 2.0 | NaN | 1110040 | 8.0 | 2025-08-11T15:41:05.870852 | 2005-05-31 | JHB_ACTG_019 | 2005-05-31
1529 | HEAT_F4459FCA5B34 | 2005-06-01 | 2005 | 6 | Winter | 15.0 | 25.0 | 3.9 | 2.0 | NaN | 1110040 | 9.0 | 2025-08-11T15:41:05.871055 | 2005-06-01 | JHB_ACTG_019 | 2005-06-01

Last rows

(index) | anonymous_patient_id | primary_date | year | month | season | Hemoglobin (g/dL) | Hematocrit (%) | White blood cell count (×10³/µL) | ALT (U/L) | AST (U/L) | Patient ID | original_record_index | harmonization_date | visit_date | coordinate_source | primary_date_parsed
1693 | HEAT_6F07D7CBD31B | 2007-05-10 | 2007 | 5 | Autumn | 11.0 | 45.0 | 2.8 | 1.0 | 52.7 | 1110283 | 173.0 | 2025-08-11T15:41:05.902427 | 2007-05-10 | JHB_ACTG_021 | 2007-05-10
1694 | HEAT_8A27E9EFC9C1 | 2007-05-10 | 2007 | 5 | Autumn | 11.0 | 48.0 | 5.2 | 1.0 | 69.1 | 1110284 | 174.0 | 2025-08-11T15:41:05.902624 | 2007-05-10 | JHB_ACTG_021 | 2007-05-10
1695 | HEAT_8A27E9EFC9C1 | 2007-05-10 | 2007 | 5 | Autumn | 11.0 | 48.0 | 5.2 | 1.0 | 69.1 | 1110284 | 175.0 | 2025-08-11T15:41:05.902813 | 2007-05-10 | JHB_ACTG_021 | 2007-05-10
1696 | HEAT_546B86594237 | 2007-05-10 | 2007 | 5 | Autumn | 12.0 | 45.0 | 5.6 | 1.0 | 61.2 | 1110285 | 176.0 | 2025-08-11T15:41:05.903006 | 2007-05-10 | JHB_ACTG_021 | 2007-05-10
1697 | HEAT_546B86594237 | 2007-05-10 | 2007 | 5 | Autumn | 12.0 | 45.0 | 5.6 | 1.0 | 61.2 | 1110285 | 177.0 | 2025-08-11T15:41:05.903195 | 2007-05-10 | JHB_ACTG_021 | 2007-05-10
1698 | HEAT_307294D069C2 | 2007-05-11 | 2007 | 5 | Autumn | 13.0 | 0.0 | 4.3 | 1.0 | 52.2 | 1110286 | 178.0 | 2025-08-11T15:41:05.903384 | 2007-05-11 | JHB_ACTG_021 | 2007-05-11
1699 | HEAT_307294D069C2 | 2007-05-11 | 2007 | 5 | Autumn | 13.0 | 0.0 | 4.3 | 1.0 | 52.2 | 1110286 | 179.0 | 2025-08-11T15:41:05.903581 | 2007-05-11 | JHB_ACTG_021 | 2007-05-11
1700 | HEAT_15C6FF1F74F9 | 2007-05-11 | 2007 | 5 | Autumn | 12.0 | 40.0 | 4.7 | 1.0 | 59.8 | 1110288 | 180.0 | 2025-08-11T15:41:05.903771 | 2007-05-11 | JHB_ACTG_021 | 2007-05-11
1701 | HEAT_15C6FF1F74F9 | 2007-05-11 | 2007 | 5 | Autumn | 12.0 | 40.0 | 4.7 | 1.0 | 59.8 | 1110288 | 181.0 | 2025-08-11T15:41:05.903960 | 2007-05-11 | JHB_ACTG_021 | 2007-05-11
1702 | HEAT_945EB2014C3C | 2007-06-12 | 2007 | 6 | Winter | 12.0 | 35.0 | 3.7 | 1.0 | 65.4 | 1110289 | 182.0 | 2025-08-11T15:41:05.904148 | 2007-06-12 | JHB_ACTG_021 | 2007-06-12